Abstract: Estimation methods for the Levy density of a Levy process are 
developed under mild qualitative assumptions. A classical model selection ap- 
proach made up of two steps is studied. The first step consists in the selection 
of a good estimator, from an approximating (finite-dimensional) linear model 
S for the true Levy density. The second is a data-driven selection of a linear 
model S, among a given collection $\{S_m\}_{m \in \mathcal{M}}$, that approximately realizes the 
best trade-off between the error of estimation within S and the error incurred 
when approximating the true Levy density by the linear model S. Using recent 
concentration inequalities for functionals of Poisson integrals, a bound for the 
risk of estimation is obtained. As a byproduct, oracle inequalities and long- 
run asymptotics for spline estimators are derived. Even though the resulting 
underlying statistics are based on continuous time observations of the process, 
approximations based on high-frequency discrete-data can be easily devised. 



1. Introduction 



Levy processes are central to the classical theory of stochastic processes, not only as 
discontinuous generalizations of Brownian motion, but also as prototypical Markov 
processes and semimartingales (see [27] and [f| for monographs on these topics). In
recent years, continuous-time models driven by Levy processes have received a great
deal of attention mainly because of their applications in the area of mathematical 
finance (see e.g. [14] and references therein). The scope of these models goes from
simple exponential Levy models (e.g. @, [nj EH and [HI), where the underlying 
source of randomness in the Black-Scholes model is replaced by a Levy process, to 
exponential time-changed Levy processes (e.g. [Hj]-[13]) and to stochastic differential
equations driven by multivariate Levy processes. Exponential Levy
models have proved successful to account for several empirical features observed in 
time series of financial returns such as heavy tails, high-kurtosis, and asymmetry 
(see, for example, [1(3] and [IB]). Levy processes, as models capturing the most ba- 
sic features of returns and as "first-order approximations" to other more accurate 
models, should be considered first in developing and testing a successful statistical 
methodology. However, even in such parsimonious models, there are several issues 
in performing statistical inference by standard likelihood-based methods. 

Levy processes are determined by three "parameters": a non-negative real $\sigma^2$,
a real $\mu$, and a measure $\nu$ on $\mathbb{R}\setminus\{0\}$. These three parameters characterize a Levy
process $\{X(t)\}_{t \geq 0}$ as the superposition of a Brownian motion with drift, $\sigma B(t) + \mu t$,
and an independent pure-jump Levy process, whose jump behavior is specified
by the measure $\nu$ in that for any $A \in \mathcal{B}(\mathbb{R})$, whose indicator $\chi_A$ vanishes in a
neighborhood of the origin,

$$\nu(A) = \frac{1}{t}\, E\left[\sum_{s \leq t} \chi_A(\Delta X(s))\right],$$

for any $t > 0$ (see Section 19 of [23]). Here, $\Delta X(t) = X(t) - X(t^-)$ denotes the
jump of $X$ at time $t$. Thus, $\nu(A)$ gives the average number of jumps (per unit time)
whose magnitudes fall in the set $A$. A common assumption in Levy-based financial
models is that $\nu$ is determined by a function $p : \mathbb{R}\setminus\{0\} \to [0, \infty)$, called the Levy
density, as follows

$$\nu(A) = \int_A p(x)\,dx, \quad \forall A \in \mathcal{B}(\mathbb{R}\setminus\{0\}).$$

Intuitively, the value of $p$ at $x_0$ provides information on the frequency of jumps
with sizes "close" to $x_0$.
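
The interpretation of $\nu(A)$ as a jump frequency can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the paper: the process is a compound Poisson process with jump rate 5 and Exp(1) jump sizes, so that $p(x) = 5e^{-x}$ for $x > 0$, and the helper names `simulate_jumps` and `estimate_nu` are hypothetical.

```python
import math
import random

def simulate_jumps(rate, horizon, rng):
    """Jump sizes of a compound Poisson process observed on [0, horizon].

    Assumption of this sketch: jump times arrive at the given rate and jump
    sizes are Exp(1), so the Levy density is p(x) = rate * exp(-x) for x > 0.
    """
    jumps = []
    t = rng.expovariate(rate)
    while t <= horizon:
        jumps.append(rng.expovariate(1.0))
        t += rng.expovariate(rate)
    return jumps

def estimate_nu(jumps, horizon, lo, hi):
    """Average number of jumps per unit time with size in (lo, hi]."""
    return sum(1 for x in jumps if lo < x <= hi) / horizon

rng = random.Random(0)
rate, horizon = 5.0, 2000.0
jumps = simulate_jumps(rate, horizon, rng)
nu_hat = estimate_nu(jumps, horizon, 0.5, 1.5)
nu_true = rate * (math.exp(-0.5) - math.exp(-1.5))  # integral of p over (0.5, 1.5]
print(f"empirical nu(A) = {nu_hat:.3f}, exact nu(A) = {nu_true:.3f}")
```

For a long observation window the empirical frequency should be close to the integral of $p$ over $A$, which is the content of the two displayed formulas.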

Estimating the Levy density poses a nontrivial problem, even when p takes simple 
parametric forms. Parsimonious Levy densities usually produce not only intractable 
marginal densities, but sometimes marginal densities which are not even expressible 
in a closed form. The current practice of estimation relies on numerical approxi- 
mations of the density function of X(t) using inversion formulas combined with 
maximum likelihood estimation (see for instance [10]). Such approximations make
the estimation computationally expensive and particularly susceptible to numerical 
errors and mis-specifications. Even in the case of closed form marginal densities, 
maximum-likelihood based methods present serious numerical problems. For instance,
analyzing generalized hyperbolic Levy processes, the author of 2J] notices
that the likelihood function is highly flat for a wide range of parameters and good 
starting values as well as convergence are critical. Also, the separation of parame- 
ters and identification between different subclasses is difficult. These issues worsen 
when dealing with "high-frequency" data. Other calibration methods include meth- 
ods based on moments, simulation based methods, and multinomial log likelihoods 
(see e.g. [29] and @ and references therein). However, our goal in the present paper
is not to match the precision of some of these parametric methods, but rather to gain
in robustness and efficiency using non-parametric methods. That is to say, assuming
only qualitative information on the Levy density, we develop estimation schemes
for the Levy density $p$ that provide fairly general function estimators $\hat{p}$.

We follow the so-called model selection methodology developed in the context of
density estimation in 0], and recently extended to the estimation of intensity
functions for Poisson processes in [25]. The essence of this approach is to approximate
an infinite-dimensional, nonparametric model by a sequence of finite-dimensional
models. This strategy has its origins in Grenander's method of sieves (see [17]).
Concretely, the procedure addresses two problems. First, the selection of a good
estimator $\hat{p}_S$, called the projection estimator, out of an approximating (finite-dimensional)
linear model $S$ for the Levy density. Second, the selection of a linear model $S_m$,
among a given collection of linear models $\{S_m\}_m$, that approximately realizes the
best trade-off between the error of estimation from the first step, and the error
incurred when approximating the unknown Levy density by the linear model $S$.
The technique used in the second step has the general flavor of cross-validation via
a penalization term, leading to penalized projection estimators $\tilde{p}$ (p.p.e.).

Comparing our approach to other non-parametric methods for non-homogeneous
Poisson processes (see e.g. [2fJ,l2l| and [25]), we will see that the main difficulty here
is the fact that the jump process associated with a Levy process has potentially
infinitely many small jumps. To overcome this problem, we introduce a reference 
measure and estimate instead the Levy density with respect to this measure. In 
contrast to [1^], our treatment does not rely on the finiteness of such a reference 
measure. Our main objective here is to estimate the order of magnitude of the 
mean-square error, $E\|p - \tilde{p}\|^2$, between the true Levy density and the p.p.e. To
accomplish this, we apply concentration inequalities for functionals of Poisson point 
processes such as functions of stochastic Poisson integrals (see e.g. 0, 0, This 
important statistical application of concentration inequalities is well-known in other 
contexts such as regression and density estimation (see Q and references therein). 
The bound for the risk of estimation leads in turn to oracle inequalities implying 
that the p.p.e. is at least as good (in terms of the long term rate of convergence) 
as the best projection estimator (see Section 4 for details). Also, combining the
bound with results on the approximation of smooth functions by sieves, one can 
determine the long-term rate of convergence of the p.p.e. on certain well-known 
approximating spaces of functions such as splines. 

The statistics underlying our estimators are expressed in terms of deterministic 
functions of the jumps of the process, and thus, they are intrinsically based on a 
continuous-time observation of the process during some time period [0, T]. Even 
though this observation scheme has an obvious drawback, statistical analysis under 
it presents a lot of interest for two reasons. First, very powerful theoretical results 
can be obtained, thus providing benchmarks of what can be achieved by discrete- 
data-based statistical methods. Second, since the path of the process can in principle 
be approximated by high-frequency sampling, it is possible to construct feasible 
estimators by approximating the continuous-time based statistics using discrete- 
observations. We use this last idea to obtain estimators by replacing the jumps by 
increments, based on equally spaced observations of the process. 

Let us describe the outline of the paper. We develop the model selection approach
in Sections 2 and 3. We proceed to obtain in Section 4 bounds for the risk of
estimation, and consequently prove oracle inequalities. In Section 5, the rate of
convergence of the p.p.e. on regular splines, when the Levy density belongs to some
Lipschitz spaces or Besov spaces of smooth functions, is derived. In Section 6,
implementation of the method using discrete-time sampling of the process is briefly
discussed. We finish with proofs of the main results.



2. A non-parametric estimation method 



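Consider a real Levy process $X = \{X(t)\}_{t \geq 0}$ with Levy density $p : \mathbb{R}_0 \to \mathbb{R}_+$,
where $\mathbb{R}_0 = \mathbb{R}\setminus\{0\}$. Then, $X$ is a cadlag process with independent and stationary
increments such that the characteristic function of its marginals is given by

$$E\left[e^{iuX(t)}\right] = \exp\left\{ t \left( iub - \frac{u^2\sigma^2}{2} + \int_{\mathbb{R}_0} \left( e^{iux} - 1 - iux\,\mathbf{1}_{[|x| \leq 1]} \right) p(x)\,dx \right) \right\}, \tag{2.1}$$

with $p : \mathbb{R}_0 \to \mathbb{R}_+$ such that

$$\int_{\mathbb{R}_0} (1 \wedge x^2)\, p(x)\,dx < \infty. \tag{2.2}$$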



Since $X$ is a cadlag process, the set of its jump times

$$\{t > 0 : \Delta X(t) = X(t) - X(t^-) \neq 0\}$$

is countable. Moreover, for Borel subsets $B$ of $[0, \infty) \times \mathbb{R}_0$,

$$\mathcal{J}(B) = \#\{t > 0 : (t, X(t) - X(t^-)) \in B\}, \tag{2.3}$$

is a well-defined random measure on $[0, \infty) \times \mathbb{R}_0$, where $\#$ denotes cardinality. The
Levy-Ito decomposition of the sample paths (see Theorem 19.2 of [13]) implies that
$\mathcal{J}$ is a Poisson process on the Borel sets $\mathcal{B}([0, \infty) \times \mathbb{R}_0)$ with mean measure

$$\mu(B) = \iint_B p(x)\,dt\,dx. \tag{2.4}$$

Recall also that the stochastic integral of a deterministic function $f : \mathbb{R}_0 \to \mathbb{R}$ with
respect to $\mathcal{J}$ is defined by

$$I(f) = \iint_{[0,T] \times \mathbb{R}_0} f(x)\,\mathcal{J}(dt, dx) = \sum_{t \leq T} f(\Delta X(t)), \tag{2.5}$$

where this last expression is well defined if

$$\int_0^T \int_{\mathbb{R}_0} |f(x)|\,\mu(dt, dx) = T \int_{\mathbb{R}_0} |f(x)|\,p(x)\,dx < \infty;$$

see e.g. Chapter 10 in [19].

We consider the problem of estimating the Levy density $p$ on a Borel set
$D \in \mathcal{B}(\mathbb{R}_0)$ using a projection estimation approach. According to this paradigm, $p$ is
estimated by estimating its best approximating function in a finite-dimensional
linear space $S$. The linear space $S$ is taken so that it has good approximation
properties for general classes of functions. Typical choices are piecewise polynomials
or wavelets. Throughout, we make the following standing assumption.

Assumption 1. The Levy measure $\nu(dx) = p(x)dx$ is absolutely continuous with
respect to a known measure $\eta$ on $\mathcal{B}(D)$ so that the Radon-Nikodym derivative

$$\frac{d\nu}{d\eta}(x) = s(x), \quad x \in D, \tag{2.6}$$

is positive, bounded, and satisfies

$$\int_D s^2(x)\,\eta(dx) < \infty. \tag{2.7}$$

In that case, $s$ is called the Levy density, on $D$, of the process with respect to the
reference measure $\eta$.

Remark 2.1. Under the previous assumption, the measure $\mathcal{J}$ of (2.3), when restricted
to $\mathcal{B}([0, \infty) \times D)$, is a Poisson process with mean measure

$$\mu(B) = \iint_B s(x)\,dt\,\eta(dx), \quad B \in \mathcal{B}([0, \infty) \times D). \tag{2.8}$$

Our goal will be to estimate the Levy density $s$, which itself could in turn be
used to retrieve $p$ on $D$ via (2.6). To illustrate this strategy consider a continuous
Levy density $p$ such that

$$p(x) = O(x^{-1}), \quad \text{as } x \to 0.$$

This type of densities satisfies the above assumption with respect to the measure
$\eta(dx) = x^{-2}dx$ on domains of the form $D = \{x : 0 < |x| < b\}$. Clearly, an estimator
$\hat{p}$ for the Levy density $p$ can be generated from an estimator $\hat{s}$ for $s$ by fixing
$\hat{p}(x) = x^{-2}\hat{s}(x)$.
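
As a quick numerical illustration of this change of reference measure, the sketch below takes a hypothetical Gamma-type density $p(x) = e^{-x}/x$ on $x > 0$ (an assumption made only for this example, restricted to the positive half of $D$), forms $s(x) = x^2 p(x)$, and checks condition (2.7) on $(0, 1)$ by a midpoint rule. For this particular choice the integrand $s^2(x)x^{-2}$ reduces to $e^{-2x}$, so the integral has the closed form $(1 - e^{-2})/2$.

```python
import math

def p(x):
    """Hypothetical Levy density of Gamma type: p(x) = exp(-x) / x for x > 0."""
    return math.exp(-x) / x

def s(x):
    """Density of nu with respect to eta(dx) = x**-2 dx, i.e. s(x) = x**2 * p(x)."""
    return x * x * p(x)

def integral_s2_eta(b, n=20000):
    """Midpoint rule for int_0^b s(x)**2 eta(dx) = int_0^b s(x)**2 / x**2 dx."""
    h = b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += s(x) ** 2 / (x * x)
    return total * h

def p_hat_from_s_hat(s_hat, x):
    """Map an estimator of s back to an estimator of p via p_hat(x) = s_hat(x) / x**2."""
    return s_hat(x) / (x * x)

val = integral_s2_eta(1.0)
exact = (1.0 - math.exp(-2.0)) / 2.0
print(val, exact)
```

Although $p$ itself is not integrable near the origin, $s$ is bounded and square integrable with respect to $\eta$, which is what Assumption 1 requires.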

Let us now describe the main ingredients of our approach. Let $S$ be a finite
dimensional subspace of $L^2 = L^2((D, \eta))$ equipped with the standard norm

$$\|f\|_\eta^2 = \int_D f^2(x)\,\eta(dx).$$

The space $S$ plays the role of an approximating linear model for the Levy density $s$.
Of course, under the $L^2$ norm, the best approximation of $s$ on $S$ is the orthogonal
projection defined by

$$s^\perp(x) = \sum_{i=1}^d \left( \int_D \varphi_i(y) s(y)\,\eta(dy) \right) \varphi_i(x), \tag{2.9}$$

where $\{\varphi_1, \dots, \varphi_d\}$ is an arbitrary orthonormal basis of $S$. The projection estimator
of $s$ on $S$ is defined by

$$\hat{s}(x) = \sum_{i=1}^d \hat{\beta}_i \varphi_i(x), \tag{2.10}$$

where we fix

$$\hat{\beta}_i = \frac{1}{T} \iint_{[0,T] \times D} \varphi_i(x)\,\mathcal{J}(dt, dx). \tag{2.11}$$

This is the most natural unbiased estimator for the orthogonal projection $s^\perp$. Notice
also that $\hat{s}$ is independent of the specific orthonormal basis of $S$. Indeed, the
projection estimator is the unique solution to the minimization problem

$$\min_{f \in S} \gamma_D(f),$$

where $\gamma_D : L^2((D, \eta)) \to \mathbb{R}$ is given by

$$\gamma_D(f) = -\frac{2}{T} \iint_{[0,T] \times D} f(x)\,\mathcal{J}(dt, dx) + \int_D f^2(x)\,\eta(dx). \tag{2.12}$$

In the literature on model selection (see e.g. 0] and [25]), $\gamma_D$ is the so-called contrast
function. The previous characterization also provides a mechanism to numerically
evaluate $\hat{s}$ when an orthonormal basis of $S$ is not explicitly available.
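
To make the construction (2.10)-(2.11) concrete, here is a minimal sketch for one particular linear model: piecewise constant functions on $d$ equal bins of $D = [a, b]$ with $\eta$ the Lebesgue measure. The basis $\varphi_i = \mathbf{1}_{\text{bin } i}/\sqrt{h}$, the simulated compound Poisson data with $s(x) = 5e^{-x}$, and the function name `projection_estimator` are assumptions of this example, not choices made in the paper.

```python
import math
import random

def projection_estimator(jumps, horizon, a, b, d):
    """Histogram projection estimator of s on D = [a, b) with d equal bins.

    Uses the orthonormal basis phi_i = 1_{bin i} / sqrt(h) of L^2([a, b], dx),
    so beta_i is (1/T) times the sum of phi_i over the observed jumps, as in (2.11).
    """
    h = (b - a) / d
    beta = [0.0] * d
    for x in jumps:
        if a <= x < b:
            i = min(int((x - a) / h), d - 1)
            beta[i] += 1.0 / (horizon * math.sqrt(h))   # phi_i(x) / T

    def s_hat(x):
        if not (a <= x < b):
            return 0.0
        i = min(int((x - a) / h), d - 1)
        return beta[i] / math.sqrt(h)                   # beta_i * phi_i(x)

    return beta, s_hat

# Hypothetical data: compound Poisson jumps with s(x) = 5 exp(-x) on x > 0.
rng = random.Random(1)
horizon, rate = 2000.0, 5.0
jumps, t = [], rng.expovariate(rate)
while t <= horizon:
    jumps.append(rng.expovariate(1.0))
    t += rng.expovariate(rate)

beta, s_hat = projection_estimator(jumps, horizon, 0.2, 2.2, 4)
bin_average = 10.0 * (math.exp(-0.2) - math.exp(-0.7))  # mean of s over [0.2, 0.7)
print(s_hat(0.45), bin_average)
```

On each bin the estimator is the number of jumps falling in the bin divided by $Th$, which estimates the average of $s$ over that bin, that is, the value of $s^\perp$ there.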

The following proposition provides both the first-order and the second-order
properties of $\hat{s}$. These follow directly from the well-known formulas for the mean
and variance of Poisson integrals (see e.g. [19], Chapter 10).

Proposition 2.2. Under Assumption 1, $\hat{s}$ is an unbiased estimator for $s^\perp$ and its
"mean-square error", defined by

$$\chi^2 = \|\hat{s} - s^\perp\|_\eta^2 = \int_D \left( \hat{s}(x) - s^\perp(x) \right)^2 \eta(dx),$$

is such that

$$E\left[\chi^2\right] = \frac{1}{T} \sum_{i=1}^d \int_D \varphi_i^2(x) s(x)\,\eta(dx). \tag{2.13}$$

The risk of $\hat{s}$ admits the decomposition

$$E\left[\|s - \hat{s}\|_\eta^2\right] = \|s - s^\perp\|_\eta^2 + E\left[\chi^2\right]. \tag{2.14}$$

The first term in (2.14), the bias term, accounts for the distance of the unknown
function $s$ to the model $S$, while the second term, the variance term, measures the
error of estimation within the linear model $S$. Notice that (2.13) is finite because $s$
is assumed bounded on $D$ and thus,

$$E\left[\chi^2\right] \leq \|s\|_\infty \frac{d}{T}. \tag{2.15}$$
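
The two terms of the decomposition (2.14) can be evaluated in closed form in simple cases. The following sketch does this for histogram models with $d$ equal bins on $D = [0.2, 2.2]$, Lebesgue reference measure, and the hypothetical density $s(x) = 5e^{-x}$; the window, the horizon $T = 50$, and the function names are illustrative assumptions. It also checks the bound (2.15).

```python
import math

A, B, T = 0.2, 2.2, 50.0   # window D = [A, B] and horizon T (illustrative values)

def s_int(lo, hi):
    """Integral of the hypothetical density s(x) = 5 exp(-x) over [lo, hi]."""
    return 5.0 * (math.exp(-lo) - math.exp(-hi))

def s2_int(lo, hi):
    """Integral of s(x)**2 = 25 exp(-2x) over [lo, hi]."""
    return 12.5 * (math.exp(-2.0 * lo) - math.exp(-2.0 * hi))

def bias_and_variance(d):
    """Bias and variance terms of (2.14) for the histogram model with d bins."""
    h = (B - A) / d
    masses = [s_int(A + i * h, A + (i + 1) * h) for i in range(d)]
    bias = s2_int(A, B) - sum(m * m for m in masses) / h   # ||s - s_perp||^2
    var = sum(masses) / (h * T)                            # right side of (2.13)
    return bias, var

sup_s = 5.0 * math.exp(-A)   # sup of s over D
for d in (1, 2, 4, 8, 16, 32):
    bias, var = bias_and_variance(d)
    print(d, round(bias, 4), round(var, 4), round(bias + var, 4), var <= sup_s * d / T)
```

As $d$ grows the bias term decreases while the variance term grows linearly in $d$, which is the trade-off that motivates the model selection step.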



3. Model selection via penalized projection estimator 

A crucial issue in the above approach is the selection of the approximating linear
model $S$. In principle, a "nice" density $s$ can be approximated closely by general
linear models such as splines or wavelets. However, a more robust model $S'$ containing
$S$ will result in a better approximation of $s$, but with a larger variance.
This raises the natural problem of selecting one model, out of a collection of linear
models $\{S_m, m \in \mathcal{M}\}$, that approximately realizes the best trade-off between the
risk of estimation within the model and the distance of the unknown Levy density
to the approximating model.

Let $\hat{s}_m$ and $s_m^\perp$ be respectively the projection estimator and the orthogonal
projection of $s$ on $S_m$. The following equation, readily derived from (2.14), gives insight
on a sensible solution to the model selection problem:

$$E\left[\|s - \hat{s}_m\|_\eta^2\right] = \|s\|_\eta^2 + E\left[-\|\hat{s}_m\|_\eta^2 + \mathrm{pen}(m)\right]. \tag{3.1}$$

Here, $\mathrm{pen}(m)$ is defined in terms of an orthonormal basis $\{\varphi_{1,m}, \dots, \varphi_{d_m,m}\}$ of $S_m$
by the equation:

$$\mathrm{pen}(m) = \frac{2}{T^2} \iint_{[0,T] \times D} \left( \sum_{i=1}^{d_m} \varphi_{i,m}^2(x) \right) \mathcal{J}(dt, dx). \tag{3.2}$$

Equation (3.1) shows that the risk of $\hat{s}_m$ moves "parallel" to the expectation of
the observable statistics $-\|\hat{s}_m\|_\eta^2 + \mathrm{pen}(m)$. This fact justifies choosing the model
that minimizes such statistics. We will see later that other choices for $\mathrm{pen}(\cdot)$ also
give good results. Therefore, given a penalization function $\mathrm{pen} : \mathcal{M} \to [0, \infty)$, we
consider estimators of the form

$$\tilde{s} = \hat{s}_{\hat{m}}, \tag{3.3}$$

where $\hat{s}_m$ is the projection estimator on $S_m$ and

$$\hat{m} = \mathrm{argmin}_{m \in \mathcal{M}} \left\{ -\|\hat{s}_m\|_\eta^2 + \mathrm{pen}(m) \right\}.$$

An estimator $\tilde{s}$ as in (3.3) is called a penalized projection estimator (p.p.e.)
on the collection of linear models $\{S_m, m \in \mathcal{M}\}$.
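
The selection rule above can be sketched in a few lines for a concrete family of models. The example below is only an illustration under stated assumptions: the models are histograms with $d$ equal bins on $D = [0.2, 2.2)$ with Lebesgue reference measure, the penalty is $c$ times $\frac{1}{T^2}\iint \sum_i \varphi_{i,m}^2 \, d\mathcal{J}$ (so $c = 2$ reproduces (3.2)), the data are simulated compound Poisson jumps, and `select_dimension` is a hypothetical helper name.

```python
import random

def select_dimension(jumps, horizon, a, b, dims, c=2.0):
    """Pick the histogram dimension minimizing -||s_hat_m||^2 + pen(m).

    For d equal bins of width h, sum_i phi_i(x)^2 = 1/h on D, hence
    pen(m) = c * (number of jumps in D) / (T^2 * h).
    """
    inside = [x for x in jumps if a <= x < b]
    best_crit, best_d = None, None
    for d in dims:
        h = (b - a) / d
        counts = [0] * d
        for x in inside:
            counts[min(int((x - a) / h), d - 1)] += 1
        norm2 = sum(k * k for k in counts) / (horizon ** 2 * h)   # ||s_hat_m||^2
        pen = c * len(inside) / (horizon ** 2 * h)
        crit = -norm2 + pen
        if best_crit is None or crit < best_crit:
            best_crit, best_d = crit, d
    return best_d

# Hypothetical data: jump rate 5 and Exp(1) jump sizes, i.e. s(x) = 5 exp(-x).
rng = random.Random(5)
horizon, rate = 1000.0, 5.0
jumps, t = [], rng.expovariate(rate)
while t <= horizon:
    jumps.append(rng.expovariate(1.0))
    t += rng.expovariate(rate)

dims = [1, 2, 4, 8, 16, 32, 64, 128, 256]
d_hat = select_dimension(jumps, horizon, 0.2, 2.2, dims)
print("selected number of bins:", d_hat)
```

Very coarse models are rejected because of their bias and very fine ones because of the penalty, so the selected dimension typically sits in between.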

Methods of estimation based on the minimization of penalty functions have a long
history in the literature on regression and density estimation (for instance, 0, [22],
and [28]). The general idea is to choose, among a given collection of parametric
models, the model that minimizes a loss function plus a penalty term that controls
the variance term, which will necessarily increase as the approximating linear models
become more detailed. Such penalized estimation was promoted for nonparametric
density estimation in [8], and in the context of non-homogeneous Poisson processes
in [25].



4. Risk bound and oracle inequalities 



The penalization idea of the previous section provides a sensible criterion to select
an estimator $\tilde{s} = \hat{s}_{\hat{m}}$ out of the projection estimators $\{\hat{s}_m : m \in \mathcal{M}\}$ induced by a
given collection of approximating linear models $\{S_m, m \in \mathcal{M}\}$. Ideally, one wishes
to choose that projection estimator $\hat{s}_{m^*}$ that minimizes the risk; namely, such that

$$E\left[\|s - \hat{s}_{m^*}\|_\eta^2\right] \leq E\left[\|s - \hat{s}_m\|_\eta^2\right], \quad \text{for all } m \in \mathcal{M}. \tag{4.1}$$

Of course, to pick the best $\hat{s}_m$ is not feasible since $s$ is not available to actually
compute and compare the risks. But, how bad would the risk of $\tilde{s}$ be compared
to the best possible risk that can be achieved by projection estimators? One can
aspire to achieve the smallest possible risk up to a constant. In other words, it is
desirable that our estimator $\tilde{s}$ comply with an inequality of the form

$$E\left[\|s - \tilde{s}\|_\eta^2\right] \leq C \inf_{m \in \mathcal{M}} E\left[\|s - \hat{s}_m\|_\eta^2\right], \tag{4.2}$$

for a constant $C$ independent of the linear models. The model $S_{m^*}$ that achieves
the minimal risk (using projection estimation) is the oracle model and inequalities
of the type (4.2) are called oracle inequalities. Approximate oracle inequalities were
proved in [25] for the intensity function of a nonhomogeneous Poisson process. In
this section we show that for certain penalization functions, the resulting penalized
projection estimator $\tilde{s}$ defined by (3.3) satisfies the inequality

$$E\left[\|s - \tilde{s}\|_\eta^2\right] \leq C \inf_{m \in \mathcal{M}} E\left[\|s - \hat{s}_m\|_\eta^2\right] + \frac{C'}{T}, \tag{4.3}$$

for some "model free" constants $C$, $C'$ (remember that the time period of observations
is $[0, T]$). The main tool in obtaining oracle inequalities is an upper bound for
the risk of the penalized projection estimator $\tilde{s}$. The proof of (4.3) follows essentially
from the arguments in [25]; however, to overcome the possible lack of finiteness on
the reference measure $\eta$ (see Assumption 1), which is required in [25], and to avoid
superfluous rough upper bounds, the dimension of the linear model is explicitly
included in the penalization and the arguments are refined.

Let us introduce some notation. Below, $d_m$ denotes the dimension of the linear
model $S_m$, and $\{\varphi_{1,m}, \dots, \varphi_{d_m,m}\}$ is an arbitrary orthonormal basis of $S_m$. Define

$$D_m = \sup\left\{ \|f\|_\infty^2 : f \in S_m, \ \|f\|_\eta^2 = 1 \right\}, \tag{4.4}$$

which is assumed to be finite and can be proved to be equal to $\left\| \sum_{i=1}^{d_m} \varphi_{i,m}^2 \right\|_\infty$.
We make the following regularity condition, introduced in [25], that essentially
controls the complexity of the linear models. This assumption is satisfied by splines
and trigonometric polynomials, but not by wavelet bases.

Assumption 2. There exist constants $\Gamma > 0$ and $R > 0$ such that for every positive
integer $n$,

$$\#\{m \in \mathcal{M} : d_m = n\} \leq \Gamma n^R.$$

We now present our main result. 

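Theorem 4.1. Let $\{S_m, m \in \mathcal{M}\}$ be a family of finite dimensional linear subspaces
of $L^2((D, \eta))$ satisfying Assumption 2 and such that $D_m < \infty$. Let
$\mathcal{M}_T = \{m \in \mathcal{M} : D_m \leq T\}$. If $\hat{s}_m$ and $s_m^\perp$ are respectively the projection estimator and the
orthogonal projection of the Levy density $s$ on $S_m$ then, the penalized projection
estimator $\tilde{s}_T$ on $\{S_m\}_{m \in \mathcal{M}_T}$ defined by (3.3) is such that

$$E\left[\|s - \tilde{s}_T\|_\eta^2\right] \leq C \inf_{m \in \mathcal{M}_T} \left\{ \|s - s_m^\perp\|_\eta^2 + E\left[\mathrm{pen}(m)\right] \right\} + \frac{C'}{T}, \tag{4.5}$$

whenever $\mathrm{pen} : \mathcal{M} \to [0, \infty)$ takes either one of the following forms for some fixed
(but arbitrary) constants $c > 1$, $c' > 0$, and $c'' > 0$:

(a) $\mathrm{pen}(m) \geq c \frac{D_m N}{T^2} + c' \frac{d_m}{T}$, where $N = \mathcal{J}([0, T] \times D)$ is the number of jumps
prior to $T$ with sizes in $D$ and where it is assumed that $\rho = \int_D s(x)\eta(dx) < \infty$;

(b) $\mathrm{pen}(m) \geq c \frac{\hat{V}_m}{T}$, where $\hat{V}_m$ is defined by

$$\hat{V}_m = \frac{1}{T} \iint_{[0,T] \times D} \left( \sum_{i=1}^{d_m} \varphi_{i,m}^2(x) \right) \mathcal{J}(dt, dx), \tag{4.6}$$

and where it is assumed that $\beta = \inf_{m \in \mathcal{M}} \frac{E[\hat{V}_m]}{D_m} > 0$ and that $\phi = \inf_{m \in \mathcal{M}} \frac{D_m}{d_m} > 0$;

(c) $\mathrm{pen}(m) \geq c \frac{\hat{V}_m}{T} + c' \frac{D_m}{T} + c'' \frac{d_m}{T}$.

In (4.5), the constant $C$ depends only on $c$, $c'$ and $c''$, while $C'$ varies with $c$, $c'$,
$c''$, $\Gamma$, $R$, $\|s\|_\eta$, $\|s\|_\infty$, $\rho$, $\beta$, and $\phi$.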

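Remark 4.2. It can be shown that if $c \geq 2$, then for arbitrary $\varepsilon > 0$, there is a
constant $C'(\varepsilon)$ (increasing as $\varepsilon \downarrow 0$) such that

$$E\left[\|s - \tilde{s}_T\|_\eta^2\right] \leq (1 + \varepsilon) \inf_{m \in \mathcal{M}_T} \left\{ \|s - s_m^\perp\|_\eta^2 + E\left[\mathrm{pen}(m)\right] \right\} + \frac{C'(\varepsilon)}{T}. \tag{4.7}$$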



One important consequence of the risk bound (4.5) is the following oracle
inequality:

Corollary 4.3. In the setting of Theorem 4.1(b), if the penalty function is of the
form $\mathrm{pen}(m) = c \frac{\hat{V}_m}{T}$, for every $m \in \mathcal{M}_T$, $\beta > 0$, and $\phi > 0$, then

$$E\left[\|s - \tilde{s}_T\|_\eta^2\right] \leq C \inf_{m \in \mathcal{M}_T} \left\{ E\left[\|s - \hat{s}_m\|_\eta^2\right] \right\} + \frac{C'}{T}, \tag{4.8}$$

for a constant $C$ depending only on $c$, and a constant $C'$ depending on $c$, $\Gamma$, $R$,
$\|s\|_\eta$, $\|s\|_\infty$, $\beta$, and $\phi$.

5. Rate of convergence for smooth Levy densities

We use the risk bound of the previous section to study the "long run" ($T \to \infty$)
rate of convergence of penalized projection estimators based on regular piecewise
polynomials, when the Levy density is "smooth". More precisely, on a window of
estimation $D = [a, b] \subset \mathbb{R}_0$, the Levy density of the process with respect to the
Lebesgue measure $\eta(dx) = dx$, denoted by $s$, is assumed to belong to the Besov
space (also called Lipschitz space) $\mathcal{B}_\infty^\alpha(L^p([a, b]))$ for some $p \in [2, \infty]$ and $\alpha > 0$ (see
for instance [15] and references therein for background on these spaces). Concretely,
$\mathcal{B}_\infty^\alpha(L^p([a, b]))$ consists of those functions $f \in L^p([a, b], dx)$ if $0 < p < \infty$ (or $f$
continuous if $p = \infty$) such that

$$|f|_{\mathcal{B}_\infty^\alpha(L^p)} = \sup_{\delta > 0} \frac{1}{\delta^\alpha} \sup_{0 < h \leq \delta} \left\| \Delta_h^r(f, \cdot) \right\|_{L^p([a,b],dx)} < \infty,$$

where $\Delta_h(f, x) = f(x + h) - f(x)$ and $\Delta_h^r(f, x)$ is the $r$-th order difference of $f$
defined by

$$\Delta_h^r(f, x) = \Delta_h\left( \Delta_h^{r-1}(f, \cdot), x \right),$$

for $x$ such that $x + rh \in D$ and $r \in \mathbb{N}$. The following spaces are closely related.
For $k \in \mathbb{N}$ and $\beta \in (0, 1]$ such that $\alpha = k + \beta$, let $\mathrm{Lip}(\alpha, L^p([a, b]))$ be the class of
functions $f$ such that $f, \dots, f^{(k-1)}$ are absolutely continuous on $[a, b]$ with
$f^{(k)} \in L^p((a, b))$ satisfying

$$\left\| \Delta_h(f^{(k)}, \cdot) \right\|_{L^p([a,b],dx)} \leq M h^\beta$$

for some $M < \infty$. It is known that if $\alpha > 0$ is not an integer and $1 < p < \infty$, then
$f \in \mathrm{Lip}(\alpha, L^p([a, b]))$ if and only if $f$ is a.e. equal to a function in $\mathcal{B}_\infty^\alpha(L^p([a, b]))$.
In general, $\mathrm{Lip}(\alpha, L^p([a, b])) \subset \mathcal{B}_\infty^\alpha(L^p([a, b]))$, for any $0 < p < \infty$ and $\alpha > 0$ (see
e.g. 01).

An important reason for the choice of the Besov class of smooth functions is the
availability of estimates for the error of approximation by splines, trigonometric
polynomials, and wavelets (see e.g. [IH and [3]). In particular, if $S_m^k$ denotes the
space of piecewise polynomials of degree at most $k$, based on the regular partition
of $[a, b]$ with $m$ intervals ($m \geq 1$), and $s \in \mathcal{B}_\infty^\alpha(L^p([a, b]))$ with $k > \alpha - 1$, then
there exists a constant $C(s)$ such that

$$d_p\left(s, S_m^k\right) \leq C(s)\, m^{-\alpha}, \tag{5.1}$$

where $d_p$ is the distance induced by the $L^p$-norm on $([a, b], dx)$ (see [15]). The
following gives the rate of convergence of the p.p.e. on regular splines.
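
The rate stated next can be anticipated by a back-of-the-envelope computation: by (5.1) with $p = 2$ the squared approximation error behaves like $m^{-2\alpha}$, while by (2.15) the variance term grows like $m/T$, since $d_m$ is proportional to $m$ for regular piecewise polynomials. The sketch below balances these two terms numerically. The risk proxy $C m^{-2\alpha} + v\,m/T$ and the constants $C = v = 1$ are simplifying assumptions of this illustration, not quantities from the paper.

```python
def best_dimension(T, alpha, C=1.0, v=1.0, m_max=5000):
    """Return (risk, m) minimizing the proxy C * m**(-2*alpha) + v * m / T over m."""
    risks = [(C * m ** (-2.0 * alpha) + v * m / T, m) for m in range(1, m_max + 1)]
    return min(risks)

alpha = 1.0
for T in (1e2, 1e4, 1e6):
    risk, m = best_dimension(T, alpha)
    scaled = risk * T ** (2 * alpha / (2 * alpha + 1))
    print(int(T), m, round(scaled, 3))
```

The optimal number of intervals grows like $T^{1/(2\alpha+1)}$ and the rescaled risk stays essentially constant in $T$, which is the $T^{-2\alpha/(2\alpha+1)}$ behavior appearing in the corollary.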



Corollary 5.1. With the notation of Theorem 4.1, taking $D = [a, b]$ and $\eta(dx) = dx$,
let $\tilde{s}_T$ be the penalized projection estimator on $\{S_m^k\}_{m \in \mathcal{M}_T}$ with penalization

$$\mathrm{pen}(m) = c \frac{\hat{V}_m}{T} + c' \frac{D_m}{T} + c'' \frac{d_m}{T},$$

for some fixed $c > 1$ and $c', c'' > 0$. Then, if the restriction to $D$ of the Levy density
$s$ belongs to $\mathcal{B}_\infty^\alpha(L^p([a, b]))$, with $2 \leq p \leq \infty$ and $0 < \alpha < k + 1$, then

$$\limsup_{T \to \infty} T^{2\alpha/(2\alpha+1)}\, E\left[\|s - \tilde{s}_T\|_\eta^2\right] < \infty.$$

Moreover, for any $R > 0$ and $L > 0$,

$$\limsup_{T \to \infty} T^{2\alpha/(2\alpha+1)} \sup_{s \in \Theta(R,L)} E\left[\|s - \tilde{s}_T\|_\eta^2\right] < \infty, \tag{5.2}$$

where $\Theta(R, L)$ consists of all the Levy densities $f$ such that $\|f\|_{L^\infty([a,b],dx)} < R$, and
such that the restriction of $f$ to $[a, b]$ is a member of $\mathcal{B}_\infty^\alpha(L^p([a, b]))$ with $|f|_{\mathcal{B}_\infty^\alpha} < L$.

The previous result implies that the p.p.e. on regular splines has a rate of
convergence of order $T^{-2\alpha/(2\alpha+1)}$ for the class of Besov Levy densities $\Theta(R, L)$.

6. Estimation based on discrete time data 

Let us finish with some remarks on how to approximate the continuous-time
statistics of our methods using only discrete-time observations. In practice, we can aspire
to sample the process $X(t)$ at discrete times, but we are neither able to measure the
size of the jumps $\Delta X(t) = X(t) - X(t^-)$ nor the times of the jumps $\{t : |\Delta X(t)| > 0\}$.
In general, Poisson integrals of the type

$$I(f) = \iint_{[0,T] \times \mathbb{R}_0} f(x)\,\mathcal{J}(dt, dx) = \sum_{t \leq T} f(\Delta X(t)), \tag{6.1}$$

are not accessible. Intuitively, the following statistic is the most natural
approximation to (6.1):

$$I_n(f) = \sum_{k=1}^n f(\Delta_k X), \tag{6.2}$$

where $\Delta_k X$ is the $k$-th increment of the process with time span $h_n = T/n$; that is,

$$\Delta_k X = X(k h_n) - X((k - 1) h_n), \quad k = 1, \dots, n.$$
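
The quality of this approximation can be explored on simulated data. The sketch below compares (6.1) and (6.2) for $f(x) = x^2$ on a pure-jump compound Poisson path. The jump rate 2, the Exp(1) jump sizes, the absence of drift and diffusion, and the helper names are all assumptions made for this illustration.

```python
import random

def simulate_jumps(rate, horizon, rng):
    """(time, size) pairs of a compound Poisson process with Exp(1) jump sizes."""
    out, t = [], rng.expovariate(rate)
    while t <= horizon:
        out.append((t, rng.expovariate(1.0)))
        t += rng.expovariate(rate)
    return out

def increments(jumps, horizon, n):
    """Increments X(k h) - X((k-1) h), h = horizon / n, of the pure-jump path."""
    h = horizon / n
    inc = [0.0] * n
    for t, x in jumps:
        inc[min(int(t / h), n - 1)] += x
    return inc

def f(x):
    return x * x   # vanishes at 0, so intervals without jumps contribute nothing

rng = random.Random(2)
horizon = 100.0
jumps = simulate_jumps(2.0, horizon, rng)
I_exact = sum(f(x) for _, x in jumps)                               # (6.1)
I_fine = sum(f(dx) for dx in increments(jumps, horizon, 100000))    # (6.2), n large
I_coarse = sum(f(dx) for dx in increments(jumps, horizon, 50))      # (6.2), n small
print(I_exact, I_fine, I_coarse)
```

With a fine grid almost every interval contains at most one jump, so the increment-based statistic nearly reproduces the jump-based one; with a coarse grid several jumps are aggregated in each increment and the approximation degrades.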

How good is this approximation and in what sense? Under some conditions on $f$,
we can readily prove the weak convergence of (6.2) to (6.1) using properties of the
transition distributions of $X$ in small time (see [H[, Corollary 8.9 of [13], and [Ifjjl).
The following theorem summarizes some known results on the small-time transition
distribution.

Theorem 6.1. Let $X = \{X(t)\}_{t \geq 0}$ be a Levy process with Levy measure $\nu$. The
following statements hold true.

(1) For each $a > 0$,

$$\lim_{t \to 0} \frac{1}{t} P(X(t) \geq a) = \nu([a, \infty)), \tag{6.3}$$

$$\lim_{t \to 0} \frac{1}{t} P(X(t) \leq -a) = \nu((-\infty, -a]). \tag{6.4}$$

(2) For any continuous bounded function $h$ vanishing in a neighborhood of the
origin,

$$\lim_{t \to 0} \frac{1}{t} E[h(X(t))] = \int_{\mathbb{R}_0} h(x)\,\nu(dx). \tag{6.5}$$

(3) If $h$ is continuous and bounded and if $\lim_{|x| \to 0} h(x)|x|^{-2} = 0$, then

$$\lim_{t \to 0} \frac{1}{t} E[h(X(t))] = \int_{\mathbb{R}_0} h(x)\,\nu(dx).$$

Moreover, if $\int_{\mathbb{R}_0} (|x| \wedge 1)\,\nu(dx) < \infty$, it suffices to have $h(x)(|x| \wedge 1)^{-1}$
continuous and bounded.
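Convergence results like (6.5) are useful to establish the convergence in
distribution of $I_n(f)$ since

$$E\left[e^{iuI_n(f)}\right] = \left( E\left[e^{iuf(X(T/n))}\right] \right)^n = \left( 1 + \frac{a_n}{n} \right)^n,$$

where $a_n = n E\left[h\left(X(T/n)\right)\right]$ with $h(x) = e^{iuf(x)} - 1$. So, if $f$ is such that

$$\lim_{t \to 0} \frac{1}{t} E\left[e^{iuf(X(t))} - 1\right] = \int_{\mathbb{R}_0} \left( e^{iuf(x)} - 1 \right) \nu(dx), \tag{6.6}$$

then $a_n$ converges to $a = T \int_{\mathbb{R}_0} h(x)\,\nu(dx)$, and thus

$$\lim_{n \to \infty} \left( 1 + \frac{a_n}{n} \right)^n = \lim_{n \to \infty} e^{n \log\left(1 + \frac{a_n}{n}\right)} = e^a.$$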

We thus have the following result. 

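Proposition 6.2. Let $X = \{X(t)\}_{t \geq 0}$ be a Levy process with Levy measure $\nu$.
Then,

$$\lim_{n \to \infty} E\left[e^{iuI_n(f)}\right] = \exp\left\{ T \int_{\mathbb{R}_0} \left( e^{iuf(x)} - 1 \right) \nu(dx) \right\},$$

if $f$ satisfies either one of the following conditions:

(1) $f(x) = \mathbf{1}_{(a,b]}(x) h(x)$ for an interval $[a, b] \subset \mathbb{R}_0$ and a continuous function $h$;

(2) $f$ is continuous on $\mathbb{R}_0$ and $\lim_{|x| \to 0} f(x)|x|^{-2} = 0$.

In particular, $I_n(f)$ converges in distribution to $I(f)$ under any one of the previous
two conditions.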



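Remark 6.3. Notice that if (6.5) holds true when replacing $h$ by $f$ and $f^2$, then
the mean and variance of $I_n(f)$ obey the asymptotics:

$$\lim_{n \to \infty} E\left[I_n(f)\right] = T \int_{\mathbb{R}_0} f(x)\,\nu(dx);$$

$$\lim_{n \to \infty} \mathrm{Var}\left[I_n(f)\right] = T \int_{\mathbb{R}_0} f^2(x)\,\nu(dx).$$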



Remark 6.4. Very recently, [23] proposed a procedure to disentangle the jumps
from the diffusion part in the case of jump-diffusion models driven by finite-jump
activity Levy processes. It is proved there that for certain functions $r : \mathbb{R}_+ \to \mathbb{R}_+$,
there exists $N(\omega)$ such that for $n > N(\omega)$, a jump occurs in the interval
$((k - 1)h_n, k h_n]$ if and only if $(\Delta_k X)^2 > r(h_n)$. Here, $h_n = T/n$ and $\Delta_k X$ is the
$k$-th increment of the process. These results suggest using statistics of the form

$$\sum_{k=1}^n f(\Delta_k X)\,\mathbf{1}_{\left[(\Delta_k X)^2 > r(h_n)\right]}$$

instead of (6.2) to approximate the integral (6.1).
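
A toy version of this thresholding idea is sketched below. Everything specific in it is an assumption made for illustration only: the process is $\sigma W$ plus a compound Poisson part with jump sizes bounded away from zero, the threshold is the hypothetical choice $r(h) = h^{0.9}$, and $f \equiv 1$, so that the statistic simply counts the increments flagged as containing a jump.

```python
import math
import random

rng = random.Random(3)
T, n, sigma, rate = 10.0, 10000, 0.2, 1.0
h = T / n

# Jump times and sizes; |size| >= 0.5 is an assumption that keeps the jumps
# well above the diffusion noise in this toy example.
jump_list, t = [], rng.expovariate(rate)
while t <= T:
    size = (0.5 + rng.expovariate(1.0)) * rng.choice((-1.0, 1.0))
    jump_list.append((t, size))
    t += rng.expovariate(rate)

# Increments of X = sigma * W + (compound Poisson part) on the regular grid.
inc = [sigma * math.sqrt(h) * rng.gauss(0.0, 1.0) for _ in range(n)]
for t, size in jump_list:
    inc[min(int(t / h), n - 1)] += size

def thresholded_statistic(f, inc, r):
    """Sum of f over the increments whose square exceeds the threshold r."""
    return sum(f(dx) for dx in inc if dx * dx > r)

r = h ** 0.9   # hypothetical threshold choice for this sketch
detected = thresholded_statistic(lambda x: 1.0, inc, r)
print("jumps simulated:", len(jump_list), "increments flagged:", int(detected))
```

Purely diffusive increments have standard deviation $\sigma\sqrt{h}$, far below $\sqrt{r(h)}$ here, so they are filtered out, while increments containing a jump are retained.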
7. Proofs

7.1. Proof of the risk bound

We break the proof of Theorem 4.1 into several preliminary results.



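Lemma 7.1. For any penalty function $\mathrm{pen} : \mathcal{M} \to [0, \infty)$ and any $m \in \mathcal{M}$, the
penalized projection estimator $\tilde{s}$ satisfies

$$\|s - \tilde{s}\|_\eta^2 \leq \|s - s_m^\perp\|_\eta^2 + 2\chi_{\hat{m}}^2 + 2\nu_D\left(s_{\hat{m}}^\perp - s_m^\perp\right) + \mathrm{pen}(m) - \mathrm{pen}(\hat{m}), \tag{7.1}$$

where $\chi_m^2 = \|s_m^\perp - \hat{s}_m\|_\eta^2$ and where the functional $\nu_D : L^2((D, \eta)) \to \mathbb{R}$ is defined
by

$$\nu_D(f) = \frac{1}{T} \iint_{[0,T] \times D} f(x) \left( \mathcal{J}(dt, dx) - s(x)\,dt\,\eta(dx) \right). \tag{7.2}$$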



The general idea in obtaining (4.5) is to bound the "inaccessible" terms on the
right hand side of (7.1) (namely $\chi_{\hat{m}}^2$ and $\nu_D(s_{\hat{m}}^\perp - s_m^\perp)$) by observable statistics.
In fact, the penalizations $\mathrm{pen}(\cdot)$ given in Theorem 4.1 are chosen so that the right
hand side in (7.1) does not involve $\hat{m}$. To carry out this plan, we use concentration
inequalities for $\chi_m$ and for the compensated Poisson integrals $\nu_D(f)$. The following
result gives a concentration inequality for general compensated Poisson integrals.

Proposition 7.2. Let $N$ be a Poisson process on a measurable space $(V, \mathcal{V})$ with
mean measure $\mu$ and let $f : V \to \mathbb{R}$ be an essentially bounded measurable function
satisfying $0 < \|f\|_\mu^2 = \int_V f^2(v)\mu(dv)$ and $\int_V |f(v)|\mu(dv) < \infty$. Then, for any $u > 0$,

$$P\left[ \int_V f(v)\left(N(dv) - \mu(dv)\right) \geq \|f\|_\mu \sqrt{2u} + \frac{1}{3}\|f\|_\infty u \right] \leq e^{-u}. \tag{7.3}$$

In particular, if $f : V \to [0, \infty)$ then, for any $\varepsilon > 0$ and $u > 0$,

$$P\left[ \int_V f(v)N(dv) \leq (1 + \varepsilon) \int_V f(v)\mu(dv) + \left( \frac{5}{6} + \frac{1}{2\varepsilon} \right) \|f\|_\infty u \right] \geq 1 - e^{-u}. \tag{7.4}$$

For a proof of the inequality (7.3), see [25] (Proposition 7) or [18] (Corollary
5.1). Inequality (7.4) is a direct consequence of (7.3) (see Section 7.2 for a proof).

The next result allows us to bound the Poisson functional $\chi_m$. This result is
essentially Proposition 9 of [25].
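
In the simplest case the inequality (7.3) can be checked exactly. If $f$ is the indicator of a set of $\mu$-measure $\lambda$, the compensated integral is $N_\lambda - \lambda$ with $N_\lambda$ a Poisson random variable of mean $\lambda$, $\|f\|_\mu = \sqrt{\lambda}$ and $\|f\|_\infty = 1$. The sketch below evaluates the exact tail probability and compares it with $e^{-u}$; the grid of values for $\lambda$ and $u$ and the function names are arbitrary choices for this illustration.

```python
import math

def poisson_tail(lam, k):
    """P(N >= k) for N ~ Poisson(lam), summing the pmf computed in log space."""
    total = 0.0
    for j in range(k, k + 400):
        total += math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    return total

def bernstein_check(lam, u):
    """Exact left side of (7.3) for f = indicator of a set of measure lam, and exp(-u)."""
    threshold = math.sqrt(2.0 * lam * u) + u / 3.0   # ||f||_mu sqrt(2u) + ||f||_inf u / 3
    k = math.ceil(lam + threshold)                   # N - lam >= threshold  <=>  N >= k
    return poisson_tail(lam, k), math.exp(-u)

for lam in (1.0, 5.0, 20.0):
    for u in (0.5, 1.0, 2.0, 4.0):
        lhs, rhs = bernstein_check(lam, u)
        print(lam, u, round(lhs, 5), round(rhs, 5), lhs <= rhs)
```

The exact tail stays below the bound $e^{-u}$ on this grid, often by a wide margin, which reflects the fact that (7.3) is a general-purpose Bernstein-type bound rather than a sharp one.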

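Lemma 7.3. Let $N$ be a Poisson process on a measurable space $(V, \mathcal{V})$ with mean
measure $\mu(dv) = p(v)\zeta(dv)$ and intensity function $p \in L^2(V, \mathcal{V}, \zeta)$. Let $S$ be a finite
dimensional subspace of $L^2(V, \mathcal{V}, \zeta)$ with orthonormal basis $\{\varphi_1, \dots, \varphi_d\}$, and let

$$\hat{p}(v) = \sum_{i=1}^d \left( \int_V \varphi_i(w) N(dw) \right) \varphi_i(v), \tag{7.5}$$

$$p^\perp(v) = \sum_{i=1}^d \left( \int_V p(w) \varphi_i(w) \zeta(dw) \right) \varphi_i(v). \tag{7.6}$$

Then, $\chi^2(S) = \|\hat{p} - p^\perp\|_\zeta^2$ is such that for any $u > 0$ and $\varepsilon > 0$

$$P\left[ \chi(S) \geq (1 + \varepsilon) \sqrt{E\left[\chi^2(S)\right]} + \sqrt{2 k M_S u} + k(\varepsilon) B_S u \right] \leq e^{-u}, \tag{7.7}$$

where we can take $k = 6$, $k(\varepsilon) = 1.25 + 32/\varepsilon$, and where

$$M_S = \sup\left\{ \int_V f^2(v) p(v) \zeta(dv) : f \in S, \ \|f\|_\zeta = 1 \right\}, \tag{7.8}$$

$$B_S = \sup\left\{ \|f\|_\infty : f \in S, \ \|f\|_\zeta = 1 \right\}. \tag{7.9}$$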

Following the same strategy as in [25], the idea is to obtain from the previous
lemmas a concentration inequality of the form

$$P\left[ \|s - \tilde{s}\|_\eta^2 \leq C\left( \|s - s_m^\perp\|_\eta^2 + \mathrm{pen}(m) \right) + h(\xi) \right] \geq 1 - C' e^{-\xi},$$

for constants $C$ and $C'$, and a function $h(\xi)$ (all independent of $m$). This will prove
to be enough in view of the following elementary result (see Section 7.2 for a proof).

Lemma 7.4. Let $h : [0, \infty) \to \mathbb{R}_+$ be a strictly increasing function with continuous
derivative such that $h(0) = 0$ and $\lim_{\xi \to \infty} e^{-\xi} h(\xi) = 0$. If $Z$ is a random variable
satisfying

$$P[Z > h(\xi)] \leq K e^{-\xi},$$

for every $\xi > 0$, then

$$E[Z] \leq K \int_0^\infty e^{-u} h(u)\,du.$$
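
A simple extremal case shows how this lemma turns a tail bound into a bound on the mean. Take $h(u) = u + u^2$ and $Z = h(E)$ with $E$ a standard exponential random variable, so that $P[Z > h(\xi)] = e^{-\xi}$ and $K = 1$; these choices are assumptions made only for this illustration. The bound then equals $\int_0^\infty e^{-u}(u + u^2)\,du = 3$, and it is attained. The sketch evaluates the integral by a midpoint rule and estimates $E[Z]$ by Monte Carlo.

```python
import math
import random

def h(u):
    return u + u * u   # strictly increasing, h(0) = 0, exp(-u) * h(u) -> 0

def bound(K, upper=60.0, steps=60000):
    """Midpoint rule for K * int_0^upper exp(-u) h(u) du (the tail beyond upper is negligible)."""
    du = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += math.exp(-u) * h(u)
    return K * total * du

rng = random.Random(4)
# Z = h(E) with E ~ Exp(1) satisfies P[Z > h(xi)] = exp(-xi), i.e. K = 1.
samples = [h(rng.expovariate(1.0)) for _ in range(200000)]
mc_mean = sum(samples) / len(samples)
lemma_bound = bound(1.0)
print(mc_mean, lemma_bound)
```

In the proof the lemma is applied with quadratic functions of $\xi$, for which the integral on the right is finite and explicit, as in this example.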

We are now in a position to prove Theorem 14. H Throughout the proof, we 
will have to introduce various constants and inequalities that will hold with high 
probability. In order to clarify the role that the constants play in these inequalities, 
we shall make some convention and give to the letters x, y, /, a, 6, £, /C, c, and C, 
with various sub- or superscripts, special meaning. The letters with x are reserved to 
denote positive constants that can be chosen arbitrarily. The letters with y denote 
arbitrary constants greater than 1. /, /i,/2, ••■ denote quadratic polynomials of 
the variable £ whose coefficients (denoted by a's and b's) are determined by the 
values of the x' s and y' s. The inequalities will be true with probabilities greater 
that 1 — /Ce~^, where K, is determined by the values of the x's and the y's. Finally, 
c's and C's are used to denote constants constrained by the x's and y's. It is 
important to remember that the constants in a given inequality are meant only for 
that inequality. The pair of equivalent inequalities below will be repeatedly invoked 
throughout the proof: 

(7.10)  (i) $2ab \le xa^2 + \frac{1}{x}b^2$, and
        (ii) $(a + b)^2 \le (1 + x)a^2 + \left(1 + \frac{1}{x}\right)b^2$, (for $x > 0$).

Also, for simplicity, we write below $\|\cdot\|$ to denote the $L^2$-norm with respect to the reference measure $\eta$.
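To see why the two inequalities in (7.10) are equivalent, note that in both cases the right hand side minus the left hand side equals the same perfect square, $(\sqrt{x}\,a - b/\sqrt{x})^2$. The short Python sketch below checks this numerically; the helper names and the random test values are assumptions introduced only for this illustration.

```python
import random


def gap_i(a, b, x):
    """Slack in (7.10)(i): x a^2 + b^2 / x - 2ab."""
    return x * a * a + b * b / x - 2.0 * a * b


def gap_ii(a, b, x):
    """Slack in (7.10)(ii): (1+x) a^2 + (1+1/x) b^2 - (a+b)^2."""
    return (1.0 + x) * a * a + (1.0 + 1.0 / x) * b * b - (a + b) ** 2


def perfect_square(a, b, x):
    """The common value (sqrt(x) a - b / sqrt(x))^2 of both slacks."""
    root = x ** 0.5
    return (root * a - b / root) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        x = rng.uniform(0.1, 10)
        print(gap_i(a, b, x), gap_ii(a, b, x), perfect_square(a, b, x))
```

Equality holds exactly when $xa = b$, which is the choice of $x$ that makes the bound tight for a given pair $(a, b)$.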

Proof of Theorem 4.1. We consider successive improvements of the inequality (7.1).

Inequality 1. For any positive constants $x_1$, $x_2$, $x_3$, and $x_4$, there exist a positive number $\mathcal{K}$ and an increasing quadratic function $f$ (both independent of the family of linear models and of $T$) such that, with probability larger than $1 - \mathcal{K}e^{-\xi}$,

(7.11)  $\|s - \tilde{s}\|^2 \le \|s - s_m^\perp\|^2 + 2\chi_{\hat m}^2 + 2x_1\|s_{\hat m}^\perp - s_m^\perp\|^2 + x_2\frac{D_{\hat m}}{T} + x_3\frac{D_m}{T} + x_4\frac{d_{\hat m}}{T} + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + \frac{f(\xi)}{T}.$

Proof. Let us find an upper bound for $\nu_D(s_{m'}^\perp - s_m^\perp)$, $m', m \in \mathcal{M}$. Since the operator $\nu_D$ defined by (7.2) is just a compensated integral with respect to a Poisson process with mean measure $\mu(dt\,dx) = s(x)\,dt\,\eta(dx)$, we can apply Proposition 7.2 to obtain that, for any $x'_{m'} > 0$, and with probability larger than $1 - e^{-x'_{m'}}$,

(7.12)  $\nu_D(s_{m'}^\perp - s_m^\perp) \le \left\|\frac{s_{m'}^\perp - s_m^\perp}{T}\right\|_\mu \sqrt{2x'_{m'}} + \frac{1}{3T}\|s_{m'}^\perp - s_m^\perp\|_\infty\, x'_{m'}.$



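In that case, the probability that (7.12) holds for every $m' \in \mathcal{M}$ is larger than $1 - \sum_{m' \in \mathcal{M}} e^{-x'_{m'}}$ because $P(A \cap B) \ge 1 - a - b$, whenever $P(A) \ge 1 - a$ and $P(B) \ge 1 - b$. Clearly,

$\left\|\frac{s_{m'}^\perp - s_m^\perp}{T}\right\|_\mu^2 = \iint_{[0,T]\times D} \left(\frac{s_{m'}^\perp(x) - s_m^\perp(x)}{T}\right)^2 s(x)\,dt\,\eta(dx) \le \|s\|_\infty \frac{\|s_{m'}^\perp - s_m^\perp\|^2}{T}.$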

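Using (7.10)(i), the first term on the right hand side of (7.12) is then bounded as follows:

(7.13)  $\left\|\frac{s_{m'}^\perp - s_m^\perp}{T}\right\|_\mu \sqrt{2x'_{m'}} \le x_1\|s_{m'}^\perp - s_m^\perp\|^2 + \frac{\|s\|_\infty x'_{m'}}{2Tx_1},$

for any $x_1 > 0$. Using (4.4) and (7.10)(i),

$\|s_{m'}^\perp - s_m^\perp\|_\infty\, x'_{m'} \le \left(\|s_{m'}^\perp\|_\infty + \|s_m^\perp\|_\infty\right) x'_{m'}$
$\le \left(\sqrt{D_{m'}}\,\|s_{m'}^\perp\| + \sqrt{D_m}\,\|s_m^\perp\|\right) x'_{m'}$
$\le \sqrt{D_{m'}}\,\|s\|\, x'_{m'} + \sqrt{D_m}\,\|s\|\, x'_{m'}$
$\le 3x_2D_{m'} + 3x_3D_m + \frac{\|s\|^2}{12}\left(\frac{1}{x_2} + \frac{1}{x_3}\right) x'^{\,2}_{m'},$

for all $x_2 > 0$, $x_3 > 0$. It follows that, for any $x_1 > 0$, $x_2 > 0$, and $x_3 > 0$,

$\nu_D(s_{m'}^\perp - s_m^\perp) \le x_1\|s_{m'}^\perp - s_m^\perp\|^2 + x_2\frac{D_{m'}}{T} + x_3\frac{D_m}{T} + \frac{\|s\|_\infty x'_{m'}}{2Tx_1} + \frac{\|s\|^2 x'^{\,2}_{m'}}{36T\bar{x}},$

where we set $\frac{1}{\bar{x}} = \frac{1}{x_2} + \frac{1}{x_3}$. Next, take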



1 1 

I4V dm 1 | "H n A jj- 



Then, for any positive $x_1$, $x_2$, $x_3$, and $x_4$, there is a $\mathcal{K}$ and a function $f$ such that, with probability greater than $1 - \mathcal{K}e^{-\xi}$,
Concretely,
(7.15)



f(£) = iiliU 2 + 1101100 £ 



c=r §" Sexp (-^(H A Pc)) 



Here, we used the assumption of polynomial models to come up with the constant $\mathcal{K}$. Plugging (7.14) in (7.1), and renaming the coefficient of $d_{m'}/T$, we can corroborate Inequality 1. □

Inequality 2. For any positive constants $y_1 > 1$, $x_2$, $x_3$, and $x_4$, there are positive constants $C_1 < 1$, $C_1' > 1$, and $\mathcal{K}$, and a strictly increasing quadratic polynomial $f$ (all independent of the class of linear models and of $T$) such that with probability larger than $1 - \mathcal{K}e^{-\xi}$,

(7.16)  $C_1\|s - \tilde{s}\|^2 \le C_1'\|s - s_m^\perp\|^2 + y_1\chi_{\hat m}^2 + x_2\frac{D_{\hat m}}{T} + x_3\frac{D_m}{T} + x_4\frac{d_{\hat m}}{T} + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + \frac{f(\xi)}{T}.$

Moreover, if $1 < y_1 < 2$, then $C_1' = 3 - y_1$ and $C_1 = y_1 - 1$. If $y_1 \ge 2$, then $C_1' = 1 + 4x_1$ and $C_1 = 1 - 4x_1$, where $x_1$ is any positive constant related to $f$ via the equation (7.15).

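Proof. Let us combine the term on the left hand side of (7.11) with the first three terms on the right hand side. Using the triangle inequality followed by (7.10)(ii),

$\|s_{\hat m}^\perp - s_m^\perp\|^2 \le 2\|s - s_m^\perp\|^2 + 2\|s_{\hat m}^\perp - s\|^2.$

Then, since $\chi_{\hat m}^2 = \|s_{\hat m}^\perp - \tilde{s}\|^2$, and $\|s_{\hat m}^\perp - s\|^2 = \|s - \tilde{s}\|^2 - \|s_{\hat m}^\perp - \tilde{s}\|^2$, it follows that

$\|s - s_m^\perp\|^2 + 2\chi_{\hat m}^2 + 2x_1\|s_{\hat m}^\perp - s_m^\perp\|^2 - \|s - \tilde{s}\|^2$
$\le (1 + 4x_1)\|s - s_m^\perp\|^2 + (2 - 4x_1)\|s_{\hat m}^\perp - \tilde{s}\|^2 + (4x_1 - 1)\|s - \tilde{s}\|^2,$

for every $x_1 > 0$. Then, for any $y_1 > 1$, there are positive constants $C > 0$, $C_1' > 1$, and $C_1 < 1$ such that

(7.17)  $\|s - s_m^\perp\|^2 + 2\chi_{\hat m}^2 + 2C\|s_{\hat m}^\perp - s_m^\perp\|^2 - \|s - \tilde{s}\|^2 \le C_1'\|s - s_m^\perp\|^2 + y_1\chi_{\hat m}^2 - C_1\|s - \tilde{s}\|^2.$

Combining (7.11) and (7.17), we obtain (7.16). □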



Inequality 3. For any $y_2 > 1$ and positive constants $x_i$, $i = 2, 3, 4$, there exist positive reals $C_1 < 1$, $C_1' > 1$, an increasing quadratic polynomial of the form $f_2(\xi) = a\xi^2 + b\xi$, and a constant $\mathcal{K}_2 > 0$ (all independent of the family of linear models and of $T$) so that, with probability greater than $1 - \mathcal{K}_2e^{-\xi}$,

(7.18)  $C_1\|s - \tilde{s}\|^2 \le C_1'\|s - s_m^\perp\|^2 + y_2\frac{V_{\hat m}}{T} + x_2\frac{D_{\hat m}}{T} + x_3\frac{d_{\hat m}}{T} - \mathrm{pen}(\hat m) + x_4\frac{D_m}{T} + \mathrm{pen}(m) + \frac{f_2(\xi)}{T}.$

Proof. We bound $\chi_{m'}$ using Lemma 7.3 with $V = \mathbb{R}_+ \times D$ and $\mu(dx) = s(x)\,dt\,\eta(dx)$. We regard the linear model $S_m$ as a subspace of $L^2(\mathbb{R}_+ \times D, dt\,\eta(dx))$ with orthonormal basis $\{\varphi_{1,m}/\sqrt{T}, \dots, \varphi_{d_m,m}/\sqrt{T}\}$. Recall that

$\chi_m^2 = \sum_{i=1}^{d_m}\left(\iint_{[0,T]\times D} \varphi_{i,m}(x)\,\frac{J(dt,dx) - s(x)\,dt\,\eta(dx)}{T}\right)^2.$

Then, with probability larger than $1 - \sum_{m' \in \mathcal{M}} e^{-x'_{m'}}$,

(7.19)  $\sqrt{T}\chi_{m'} \le (1 + x_1)\sqrt{V_{m'}} + \sqrt{2\kappa M_{m'}x'_{m'}} + \kappa(x_1)B_{m'}x'_{m'},$

for every $m' \in \mathcal{M}$, where $B_{m'} = \sqrt{D_{m'}/T}$,

(7.20)  $V_{m'} = \int_D \sum_{i=1}^{d_{m'}} \varphi_{i,m'}^2(x)\,s(x)\,\eta(dx), \qquad M_{m'} = \sup\left\{\int_D f^2(x)s(x)\,\eta(dx) : f \in S_{m'},\ \|f\| = 1\right\}.$



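Now, by Cauchy-Schwarz $\int_D f^2(x)s(x)\,\eta(dx) \le \|f\|_\infty\|s\|$, when $\|f\| = 1$, and so the constant $M_{m'}$ above is bounded by $\|s\|\sqrt{D_{m'}}$. In that case, we can use (7.10)(i) to obtain

$\sqrt{2\kappa M_{m'}x'_{m'}} \le x_2\sqrt{D_{m'}} + \frac{\kappa\|s\|}{2x_2}\,x'_{m'},$

for any $x_2 > 0$. On the other hand, by hypothesis $D_{m'} \le T$, and (7.19) implies that

$\sqrt{T}\chi_{m'} \le (1 + x_1)\sqrt{V_{m'}} + x_2\sqrt{D_{m'}} + \left(\frac{\kappa\|s\|}{2x_2} + \kappa(x_1)\right)x'_{m'}.$

Choosing the constant $x'_{m'}$ as

$x'_{m'} = x_3\left(\frac{\kappa\|s\|}{2x_2} + \kappa(x_1)\right)^{-1}\sqrt{d_{m'}} + \xi,$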



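we get that for any $x_1 > 0$, $x_2 > 0$, $x_3 > 0$, and $\xi > 0$,

(7.21)  $\sqrt{T}\chi_{m'} \le (1 + x_1)\sqrt{V_{m'}} + x_2\sqrt{D_{m'}} + x_3\sqrt{d_{m'}} + f_1(\xi),$

with probability larger than $1 - \mathcal{K}_1e^{-\xi}$, where

(7.22)  $f_1(\xi) = \left(\frac{\kappa\|s\|}{2x_2} + \kappa(x_1)\right)\xi, \qquad \mathcal{K}_1 = \Gamma\sum_{n=1}^{\infty} n^R \exp\left(-\sqrt{n}\,x_3\left(\frac{\kappa\|s\|}{2x_2} + \kappa(x_1)\right)^{-1}\right).$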



Squaring (7.21) and using (7.10)(ii) repeatedly, we conclude that, for any $y > 1$, $x_2 > 0$, and $x_3 > 0$, there exist both a constant $\mathcal{K}_1 > 0$ and a quadratic function of the form $f_2(\xi) = a\xi^2$ (independent of $T$, $m'$, and of the family of linear models) such that, with probability greater than $1 - \mathcal{K}_1e^{-\xi}$,

(7.23)  $\chi_{m'}^2 \le y\frac{V_{m'}}{T} + x_2\frac{D_{m'}}{T} + x_3\frac{d_{m'}}{T} + \frac{f_2(\xi)}{T}, \qquad \forall m' \in \mathcal{M}.$

Then, (7.18) immediately follows from (7.23) and (7.16). □




Proof of (4.5) for the case (c). By the inequality (7.4), we can upper bound $V_{m'}$ by $\hat{V}_{m'}$ on an event of large probability. Namely, for every $x'_{m'} > 0$ and $x > 0$, with probability greater than $1 - \sum_{m' \in \mathcal{M}} e^{-x'_{m'}}$,

(7.24)  $(1 + x)\left(\hat{V}_{m'} + \left(\frac{1}{2x} + \frac{5}{6}\right)\frac{D_{m'}}{T}\,x'_{m'}\right) \ge V_{m'}, \qquad \forall m' \in \mathcal{M},$

(recall that $D_m = \|\sum_{i=1}^{d_m}\varphi_{i,m}^2\|_\infty$). Since by hypothesis $D_{m'} \le T$, and choosing

$x'_{m'} = x'd_{m'} + \xi, \quad (x' > 0),$

it is seen that for any $x > 0$ and $x_4 > 0$, there exist a positive constant $\mathcal{K}_2$ and a function $f(\xi) = b\xi$ (independent of $T$ and of the linear models) such that with probability greater than $1 - \mathcal{K}_2e^{-\xi}$,

(7.25)  $(1 + x)\hat{V}_{m'} + x_4d_{m'} + f(\xi) \ge V_{m'}, \qquad \forall m' \in \mathcal{M}.$

Here, we get $\mathcal{K}_2$ from the polynomial assumption on the class of models. Combining (7.25) and (7.18), it is clear that for any $y_2 > 1$, and positive $x_i$, $i = 1, 2, 3$, we can choose a pair of positive constants $C_1 < 1$, $C_1' > 1$, an increasing quadratic polynomial of the form $f(\xi) = a\xi^2 + b\xi$, and a constant $\mathcal{K} > 0$ (all independent of the family of linear models and of $T$) so that, with probability greater than $1 - \mathcal{K}e^{-\xi}$,

(7.26)  $C_1\|s - \tilde{s}\|^2 \le C_1'\|s - s_m^\perp\|^2 + y_2\frac{\hat{V}_{\hat m}}{T} + x_1\frac{D_{\hat m}}{T} + x_2\frac{d_{\hat m}}{T} - \mathrm{pen}(\hat m) + x_3\frac{D_m}{T} + \mathrm{pen}(m) + \frac{f(\xi)}{T}.$

Next, we take $y_2 = c$, $x_1 = c'$, and $x_2 = c''$ to cancel $-\mathrm{pen}(\hat m)$ in (7.26). By Lemma 7.4 it follows that

(7.27)  $C_1E\left[\|s - \tilde{s}\|^2\right] \le C_1'\|s - s_m^\perp\|^2 + \left(1 + \frac{x_3}{c'}\right)E[\mathrm{pen}(m)] + \frac{C_1''}{T}.$

Since $m$ is arbitrary, we obtain the case (c) of (4.5). □



Proof of (4.5) for the case (a). One can bound $V_{m'}$, as given in (7.20), by $D_{m'}\rho$ (assuming that $\rho < \infty$). On the other hand, (7.4) implies that

(7-28) (1+Il) ^ + (J_ + |UU A 



T \2xx 6 J T 

with probability greater than $1 - e^{-\xi}$. Using these bounds for $V_{m'}$ and the assumption that $D_{m'} \le T$, (7.18) reduces to

(7-29) + y^- + Xl d f pcn(m) 

+ £2^- + pen(m) + 

which is valid with probability $1 - \mathcal{K}e^{-\xi}$. In (7.29), $y > 1$, $x_1 > 0$ and $x_2 > 0$ are arbitrary, while $C_1$, $C_1'$, the increasing quadratic polynomial of the form $f(\xi) = a\xi^2 + b\xi$, and a constant $\mathcal{K} > 0$ are determined by $y$, $x_1$, and $x_2$ independently of the family of linear models and of $T$. We point out that we divided and multiplied by $\rho$ the terms $D_{\hat m}/T$ and $D_m/T$ in (7.18), and then applied (7.28) to get (7.29). It is now clear that $y = c$, and $x_1 = c'$ will produce the desired cancellation. □

Proof of (4.5) for the case (b). We first upper bound $D_{\hat m}$ by $\beta^{-1}V_{\hat m}$ and $d_{\hat m}$ by $(\beta\phi)^{-1}V_{\hat m}$ in the inequality (7.18):



(7.30) 



Cil|s - s|| 2 < Cl||« - 4J| 2 + (y + + *2 (/ty)" 1 ) ^ 

/" a \ i o— 1 ^™ i / \ I / (0 

— pen(m) + a^p h pcn(m) + 



T T 

Then, using $d_{m'} \le (\beta\phi)^{-1}V_{m'}$ in (7.25) and letting $x_4(\beta\phi)^{-1}$ vary between 0 and 1, we verify that for any $x' > 0$, a positive constant $\mathcal{K}_4$ and a polynomial $f$ can be found so that with probability greater than $1 - \mathcal{K}_4e^{-\xi}$,

(7.31)  $(1 + x')\hat{V}_{m'} + f(\xi) \ge V_{m'}, \qquad \forall m' \in \mathcal{M}.$

Putting together (7.31) and (7.30), it is clear that for any $y > 1$ and $x_1 > 0$, we can find a pair of positive constants $C_1 < 1$, $C_1' > 1$, an increasing quadratic polynomial of the form $f(\xi) = a\xi^2 + b\xi$, and a constant $\mathcal{K} > 0$ (all independent of the family of linear models and of $T$) so that, with probability greater than $1 - \mathcal{K}e^{-\xi}$,



(?32) Cxh- S|| 2 < C[\\s- sif + y^ - pen(m) 

+ pen(m) + 



In particular, by taking $y = c$, the term $-\mathrm{pen}(\hat m)$ cancels out. Lemma 7.4 implies that

(7.33)  $C_1E\left[\|s - \tilde{s}\|^2\right] \le C_1'\|s - s_m^\perp\|^2 + (1 + x_1)E[\mathrm{pen}(m)] + \frac{C_1''}{T}.$

Finally, (4.5)(b) follows since $m$ is arbitrary. □

Remark 7.5. Let us analyze more carefully the values that the constants $C$ and $C'$ can take in the inequality (4.5). For instance, consider the penalty function of part (c). As we saw in (7.27), the constants $C$ and $C'$ are determined by $C_1$, $C_1'$, $C_1''$, and $x_3$. The constant $C_1$ was proved to be $y_1 - 1$ if $1 < y_1 < 2$, while it can be made arbitrarily close to one otherwise (see the comment immediately after (7.16)). On the other hand, $y_1$ itself can be made arbitrarily close to the penalization parameter $c$ since $c = y_2 = y(1 + x)y_1$, where $x$ is as in (7.24) and $y$ is in (7.23). Then, when $c > 2$, $C_1$ can be made arbitrarily close to one at the cost of increasing $C_1''$ in (7.27). Similarly, paying a similar cost, we are able to select $C_1'$ as close to one as we wish and $x_3$ arbitrarily small. Therefore, it is possible to find for any $\varepsilon > 0$, a constant $C'(\varepsilon)$ (increasing in $\varepsilon$) so that

(7.34)  $E\|s - \tilde{s}\|^2 \le (1 + \varepsilon)\inf_{m \in \mathcal{M}}\left\{\|s - s_m^\perp\|^2 + E[\mathrm{pen}(m)]\right\} + \frac{C'(\varepsilon)}{T}.$

A more thorough inspection shows that

$\lim_{\varepsilon \to 0} C'(\varepsilon)\,\varepsilon = K,$

where $K$ depends only on $c$, $c'$, $c''$, $\Gamma$, $R$, $\|s\|$, and $\|s\|_\infty$. The same reasoning applies to the other two types of penalty functions when $c > 2$. In particular, we point out that $C$ can be made arbitrarily close to 2 in the oracle inequality (4.8) at the price of having a large constant $C'$.

7.2. Some additional proofs



Proof of Corollary 5.1. The idea is to estimate the bias and the penalized term in (4.5). Clearly, the dimension $d_m$ of $S_m$ is $m(k + 1)$. Also, $D_m$ is bounded by $(k + 1)^2m/(b - a)$ (see (7) in Q), and

$V_m = \int_a^b \sum_{i} \varphi_{i,m}^2(x)\,s(x)\,dx \le (k + 1)m\|s\|_\infty,$

since the functions $\varphi_{i,m}$ are orthonormal. On the other hand, by (10.1) in Chapter 2 of [IH, if $s \in B^\alpha_\infty(L^p([a, b]))$, there is a piecewise polynomial $q \in S_m$ such that

$\|s - q\|_{L^p} \le c_{[\alpha]}\,|s|_{B^\alpha_\infty(L^p)}\,(b - a)^\alpha m^{-\alpha}.$

Thus,

$\|s - s_m^\perp\| \le c_{[\alpha]}\,(b - a)^{\frac{1}{2} - \frac{1}{p} + \alpha}\,|s|_{B^\alpha_\infty(L^p)}\,m^{-\alpha}.$

By (4.5), there is a constant $M$ (depending on $C$, $c$, $c'$, $c''$, $\alpha$, $k$, $b - a$, $p$, $|s|_{B^\alpha_\infty(L^p)}$ and $\|s\|_\infty$), for which

$E\left[\|s - \tilde{s}_T\|^2\right] \le M\inf_{m \in \mathcal{M}_T}\left\{m^{-2\alpha} + \frac{m}{T}\right\} + \frac{C'}{T}.$

It is not hard to see that, for large enough $T$, the infimum on the above right hand side is $O_\alpha(T^{-2\alpha/(2\alpha+1)})$ (where $O_\alpha$ means that the ratio of the terms is bounded by a constant depending only on $\alpha$). Since $M$ is monotone in $|s|_{B^\alpha_\infty(L^p)}$ and $\|s\|_\infty$, (5.1) is verified. □
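The order of the infimum can be checked numerically. Treating $m$ as a continuous variable, the minimizer of $m^{-2\alpha} + m/T$ is $m^* = (2\alpha T)^{1/(2\alpha+1)}$ and the minimum equals $(1 + 2\alpha)(2\alpha T)^{-2\alpha/(2\alpha+1)}$. The Python sketch below performs a brute-force search over integers $m \ge 1$ and reports the ratio of the minimum to $T^{-2\alpha/(2\alpha+1)}$. The function names are hypothetical, and the search ignores any upper constraint on $m$ imposed by the collection $\mathcal{M}_T$, which is an assumption of this illustration.

```python
def best_tradeoff(T, alpha):
    """Minimize m**(-2*alpha) + m/T over integers m >= 1 by brute force.

    The loop stops once the variance-type term m/T alone exceeds the best
    value found so far, since no larger m can then improve on it.
    """
    best_m, best_val = 1, 1.0 + 1.0 / T
    m = 1
    while m / T <= best_val:
        val = m ** (-2.0 * alpha) + m / T
        if val < best_val:
            best_m, best_val = m, val
        m += 1
    return best_m, best_val


def rate_ratio(T, alpha):
    """Ratio of the minimized trade-off to T**(-2*alpha/(2*alpha+1))."""
    _, best_val = best_tradeoff(T, alpha)
    return best_val * T ** (2.0 * alpha / (2.0 * alpha + 1.0))


if __name__ == "__main__":
    for T in (1e3, 1e4, 1e5):
        print(T, best_tradeoff(T, 1.0), rate_ratio(T, 1.0))
```

For $\alpha = 1$ the continuous calculation gives the limiting ratio $3 \cdot 2^{-2/3} \approx 1.89$, and the integer search stays close to this value as $T$ grows.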



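Proof of Lemma 7.1. Let

(7.35)  $\gamma_D(f) = -\frac{2}{T}\iint_{[0,T]\times D} f(x)\,J(dt,dx) + \int_D f^2(x)\,\eta(dx),$

which is well defined for any function $f \in L^2((D, \eta))$, where $D \in \mathcal{B}(\mathbb{R}_0)$ and $\eta$ is as in (2.6)-(2.8). The projection estimator is the unique minimizer of the contrast function $\gamma_D$ over $S$. Indeed, plugging $f = \sum_{i=1}^d \beta_i\varphi_i$ in (7.35) gives $\gamma_D(f) = \sum_{i=1}^d(-2\beta_i\hat\beta_i + \beta_i^2)$, and thus, $\gamma_D(f) \ge -\sum_{i=1}^d \hat\beta_i^2$ for all $f \in S$. Clearly,

$\gamma_D(f) = \|f\|^2 - 2\langle f, s\rangle - 2\nu_D(f) = \|f - s\|^2 - \|s\|^2 - 2\nu_D(f).$

By the very definition of $\tilde{s}$, as the penalized projection estimator,

$\gamma_D(\tilde{s}) + \mathrm{pen}(\hat m) \le \gamma_D(\hat{s}_m) + \mathrm{pen}(m) \le \gamma_D(s_m^\perp) + \mathrm{pen}(m),$

for any $m \in \mathcal{M}$. Using the above results,

$\|\tilde{s} - s\|^2 = \gamma_D(\tilde{s}) + \|s\|^2 + 2\nu_D(\tilde{s})$
$\le \gamma_D(s_m^\perp) + \|s\|^2 + 2\nu_D(\tilde{s}) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$
$= \|s_m^\perp - s\|^2 + 2\nu_D(\tilde{s} - s_m^\perp) + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$

Finally, notice that $\nu_D(\tilde{s} - s_m^\perp) = \nu_D(\tilde{s} - s_{\hat m}^\perp) + \nu_D(s_{\hat m}^\perp - s_m^\perp)$ and that $\nu_D(\hat{s}_{\hat m} - s_{\hat m}^\perp) = \chi_{\hat m}^2$. □

Proof of inequality (7.4). Just note that for any $a, b, \varepsilon > 0$:

(7.36)  $a - \sqrt{2ab} - \frac{1}{3}b \ge \frac{a}{1 + \varepsilon} - \left(\frac{1}{2\varepsilon} + \frac{5}{6}\right)b.$

Evaluating the integral in (7.3) for $-f$, we can write

$P\left[\int_X f(x)\,N(dx) \ge \int_X f(x)\,\mu(dx) - \|f\|_\mu\sqrt{2u} - \frac{1}{3}\|f\|_\infty u\right] \ge 1 - e^{-u}.$

Using $\|f\|_\mu^2 \le \|f\|_\infty\int_X f(x)\,\mu(dx)$ and (7.36) lead to

$P\left[\int_X f(x)\,N(dx) \ge \frac{1}{1 + \varepsilon}\int_X f(x)\,\mu(dx) - \left(\frac{1}{2\varepsilon} + \frac{5}{6}\right)\|f\|_\infty u\right] \ge 1 - e^{-u},$

which is precisely the inequality (7.4). □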
Proof of Lemma 7.4. Let $Z^+$ be the positive part of $Z$. First,

$E[Z] \le E[Z^+] = \int_0^\infty P[Z > x]\,dx.$

Since $h$ is continuous and strictly increasing, $P[Z > x] \le K\exp(-h^{-1}(x))$, where $h^{-1}$ is the inverse of $h$. Then, changing variables to $u = h^{-1}(x)$,

$\int_0^\infty P[Z > x]\,dx \le K\int_0^\infty e^{-h^{-1}(x)}\,dx = K\int_0^\infty e^{-u}h'(u)\,du.$

Finally, an integration by parts yields $\int_0^\infty e^{-u}h'(u)\,du = \int_0^\infty h(u)e^{-u}\,du$. □